NSF PAR Search | NSF Public Access Repository

The landscape of somatic mutations in lymphoblastoid cell lines

https://doi.org/10.1016/j.xgen.2023.100305

Caballero, Madison; Koren, Amnon (June 2023, Cell Genomics)

Somatic mutations have important biological ramifications while exerting substantial rate, type, and genomic location heterogeneity. Yet, their sporadic occurrence makes them difficult to study at scale and across individuals. Lymphoblastoid cell lines (LCLs), a model system for human population and functional genomics, harbor large numbers of somatic mutations and have been extensively genotyped. By comparing 1,662 LCLs, we report that the mutational landscape of the genome varies across individuals in terms of the number of mutations, their genomic locations, and their spectra; this variation may itself be modulated by somatic trans-acting mutations. Mutations attributed to the translesion DNA polymerase η follow two different modes of formation, with one mode accounting for the hypermutability of the inactive X chromosome. Nonetheless, the distribution of mutations along the inactive X chromosome appears to follow an epigenetic memory of the active form.
The evolution of the human DNA replication timing program

https://doi.org/10.1073/pnas.2213896120

Bracci, Alexa N.; Dallmann, Anissa; Ding, Qiliang; Hubisz, Melissa J.; Caballero, Madison; Koren, Amnon (March 2023, Proceedings of the National Academy of Sciences)

DNA is replicated according to a defined spatiotemporal program that is linked to both gene regulation and genome stability. The evolutionary forces that have shaped replication timing programs in eukaryotic species are largely unknown. Here, we studied the molecular causes and consequences of replication timing evolution across 94 humans, 95 chimpanzees, and 23 rhesus macaques. Replication timing differences recapitulated the species’ phylogenetic tree, suggesting continuous evolution of the DNA replication timing program in primates. Hundreds of genomic regions had significant replication timing variation between humans and chimpanzees, of which 66 showed advances in replication origin firing in humans, while 57 were delayed. Genes overlapping these regions displayed correlated changes in expression levels and chromatin structure. Many human–chimpanzee variants also exhibited interindividual replication timing variation, pointing to ongoing evolution of replication timing at these loci. Association of replication timing variation with genetic variation revealed that DNA sequence evolution can explain replication timing variation between species. Taken together, DNA replication timing shows substantial and ongoing evolution in the human lineage that is driven by sequence alterations and could impact regulatory evolution at specific genomic sites.
Welcome to the big leaves: Best practices for improving genome annotation in non‐model plant genomes

https://doi.org/10.1002/aps3.11533

Vuruputoor, Vidya S.; Monyak, Daniel; Fetter, Karl C.; Webster, Cynthia; Bhattarai, Akriti; Shrestha, Bikash; Zaman, Sumaira; Bennett, Jeremy; McEvoy, Susan L.; Caballero, Madison; et al (July 2023, Applications in Plant Sciences)

Premise: Robust standards to evaluate quality and completeness are lacking in eukaryotic structural genome annotation, as genome annotation software is developed using model organisms and typically lacks benchmarking to comprehensively evaluate the quality and accuracy of the final predictions. The annotation of plant genomes is particularly challenging due to their large sizes, abundant transposable elements, and variable ploidies. This study investigates the impact of genome quality, complexity, sequence read input, and method on protein‐coding gene predictions. Methods: The impact of repeat masking, long‐read and short‐read inputs, and de novo and genome‐guided protein evidence was examined in the context of the popular BRAKER and MAKER workflows for five plant genomes. The annotations were benchmarked for structural traits and sequence similarity. Results: Benchmarks that reflect gene structures, reciprocal similarity search alignments, and mono‐exonic/multi‐exonic gene counts provide a more complete view of annotation accuracy. Transcripts derived from RNA‐read alignments alone are not sufficient for genome annotation. Gene prediction workflows that combine evidence‐based and ab initio approaches are recommended, and a combination of short and long reads can improve genome annotation. Adding protein evidence from de novo assemblies, genome‐guided transcriptome assemblies, or full‐length proteins from OrthoDB generates more putative false positives as implemented in the current workflows. Post‐processing with functional and structural filters is highly recommended. Discussion: While the annotation of non‐model plant genomes remains complex, this study provides recommendations for inputs and methodological approaches. We discuss a set of best practices to generate an optimal plant genome annotation and present a more robust set of metrics to evaluate the resulting predictions.
Replication timing analysis in polyploid cells reveals Rif1 uses multiple mechanisms to promote underreplication in Drosophila

https://doi.org/10.1093/genetics/iyab147

Das, Souradip; Caballero, Madison; Kolesnikova, Tatyana; Zhimulev, Igor; Koren, Amnon; Nordman, Jared (September 2021, Genetics)
Bateman, J (Ed.)
Regulation of DNA replication and copy number is necessary to promote genome stability and maintain cell and tissue function. DNA replication is regulated temporally in a process known as replication timing (RT). Rap1-interacting factor 1 (Rif1) is a key regulator of RT and has a critical function in copy number control in polyploid cells. Previously, we demonstrated that Rif1 functions with SUUR to inhibit replication fork progression and promote underreplication (UR) of specific genomic regions. How Rif1-dependent control of RT factors into its ability to promote UR is unknown. By applying a computational approach to measure RT in Drosophila polyploid cells, we show that SUUR and Rif1 have differential roles in controlling UR and RT. Our findings reveal that Rif1 acts to promote late replication, which is necessary for SUUR-dependent underreplication. Our work provides new insight into the process of UR and its links to RT.
A Reference Genome Sequence for Giant Sequoia

https://doi.org/10.1534/g3.120.401612

Scott, Alison D; Zimin, Aleksey V; Puiu, Daniela; Workman, Rachael; Britton, Monica; Zaman, Sumaira; Caballero, Madison; Read, Andrew C; Bogdanove, Adam J; Burns, Emily; et al (November 2020, G3 Genes|Genomes|Genetics)

The giant sequoia (Sequoiadendron giganteum) of California are massive, long-lived trees that grow along the U.S. Sierra Nevada mountains. Genomic data are limited in giant sequoia and producing a reference genome sequence has been an important goal to allow marker development for restoration and management. Using deep-coverage Illumina and Oxford Nanopore sequencing, combined with Dovetail chromosome conformation capture libraries, the genome was assembled into eleven chromosome-scale scaffolds containing 8.125 Gbp of sequence. Iso-Seq transcripts, assembled from three distinct tissues, were used as evidence to annotate a total of 41,632 protein-coding genes. The genome was found to contain, distributed unevenly across all 11 chromosomes and in 63 orthogroups, over 900 complete or partial predicted NLR genes, of which 375 are supported by annotation derived from protein evidence and gene modeling. This giant sequoia reference genome sequence represents the first genome sequenced in the Cupressaceae family, and lays a foundation for using genomic tools to aid in giant sequoia conservation and management.